[Figure panels: (a) Weight oscillation of ReActNet; (b) Weight distribution of ReActNet; (c) Illustration of weight oscillation]

FIGURE 3.28

(a) We show the epoch-wise weight oscillation of ReActNet. (b) We randomly select two channels of the first 1-bit layer in ReActNet [158]. The distribution has three peaks centered around {−1, 0, +1}, which magnifies the non-parametric scaling factor (red line). (c) We illustrate the weight oscillation caused by such an inappropriate scale calculation, where w and L indicate the latent weight and the network loss function (blue line), respectively.

As a result, we apply this set of hyperparameters to the remaining experiments in this chapter. Note that the recurrent model does not take effect when τ is set to 1.

3.9 ReBNN: Resilient Binary Neural Network

Conventional BNNs [199, 158] are often sub-optimized due to their intrinsic frequent weight oscillation during training. We first identify that the weight oscillation mainly originates from the non-parametric scaling factor. Figure 3.28(a) shows the epoch-wise oscillation⁴ of ReActNet, where the weight oscillation persists even after the network has converged. As shown in Fig. 3.28(b), the conventional ReActNet [158] possesses a channel-wise tri-modal distribution in the 1-bit convolution layers, whose peaks center around {−1, 0, +1}. This distribution leads to a magnified scaling factor α, and thus the quantized weights ±α are much larger than the small weights around 0, which may cause the weight oscillation. As illustrated in Fig. 3.28(c), in BNNs the real-valued latent tensor is binarized by the sign function and scaled by the scaling factor (the orange dot) in forward propagation. In backward propagation, the gradient is calculated based on the quantized value ±α (indicated by the yellow dotted line). However, the gradient of small latent weights is misleading when weights around ±1 magnify the scaling factor, as in ReActNet (Fig. 3.28(a)). The update is then applied to the latent value (the black dot), leading to latent weight oscillation. With minimal representation states, such latent weights with small magnitudes frequently oscillate during non-convex optimization.
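To make this mechanism concrete, the following sketch (not the authors' code; the channel values, learning rate, and toy quadratic loss are illustrative assumptions) reproduces the effect in NumPy: a small latent weight is binarized to ±α with α computed as the non-parametric mean of the channel's absolute weights, the gradient is taken with respect to the quantized value, and the straight-through update repeatedly flips the latent weight's sign.

```python
import numpy as np

# Minimal sketch (not the authors' code): a toy 1-bit channel whose weights
# cluster around {-1, 0, +1}. The non-parametric scaling factor
# alpha = mean(|w|) is magnified by the weights near +/-1, so the
# straight-through gradient computed on the quantized value +/-alpha
# overshoots the small latent weight and flips its sign every iteration.
channel = np.concatenate([np.full(32, -1.0), np.full(32, 1.0), np.zeros(8)])
w = 0.02          # small latent weight near zero
lr = 0.1          # toy learning rate (illustrative assumption)

for t in range(4):
    alpha = np.mean(np.abs(np.append(channel, w)))  # magnified scaling factor
    w_q = np.sign(w) * alpha                         # binarized weight: +/-alpha
    grad = 2.0 * w_q                                 # gradient of toy loss (w_q - 0)^2
    w = w - lr * grad                                # STE: gradient applied to latent w
    print(f"iter {t}: alpha={alpha:.3f}, w_q={w_q:+.3f}, latent w -> {w:+.3f}")
```

Running this sketch, the latent weight's sign flips on every iteration, mirroring the oscillation pattern illustrated in Fig. 3.28(c).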

We aim to introduce a Resilient Binary Neural Network (ReBNN) [258] to address the problem above. The intuition behind our work is to relearn the channel-wise scaling factor and the latent weights in a unified framework. Consequently, we propose parameterizing the scaling factor and introducing a weighted reconstruction loss to build an adaptive training objective.
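As a rough illustration of this direction (a sketch under our own assumptions, not the published ReBNN formulation; the module name, the recon_weight coefficient, and the exact loss form are hypothetical), one could make the channel-wise scaling factor a trainable parameter and add a weighted reconstruction term between the latent and quantized weights to the training objective:

```python
import torch
import torch.nn as nn

class LearnableScaleBinarizer(nn.Module):
    """Sketch: channel-wise scaling factor learned jointly with latent weights."""

    def __init__(self, out_channels: int, recon_weight: float = 1e-4):
        super().__init__()
        # Parameterized scaling factor (one per output channel), instead of
        # the non-parametric mean(|w|) used by conventional BNNs.
        self.alpha = nn.Parameter(torch.ones(out_channels, 1, 1, 1))
        self.recon_weight = recon_weight  # hypothetical weighting coefficient

    def forward(self, latent_w: torch.Tensor):
        # Straight-through estimator for the sign function.
        w_sign = latent_w + (torch.sign(latent_w) - latent_w).detach()
        w_q = self.alpha * w_sign
        # Weighted reconstruction term between latent and quantized weights,
        # to be added to the task loss as part of an adaptive objective.
        recon_loss = self.recon_weight * (latent_w - w_q).pow(2).sum()
        return w_q, recon_loss
```

In such a setup, the returned reconstruction term would be added to the task loss during training, so that the scaling factor and the latent weights are relearned together from the data.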

⁴A toy example of weight oscillation: from iteration t to t+1, a misleading weight update occurs, causing an oscillation from −1 to 1; and from iteration t+1 to t+2, the weight oscillates back from 1 to −1.